[ADD] add transformer_int4_fp16_loadlowbit_gpu_win api #11511
Conversation
Please update the config here: https://github.com/intel-analytics/ipex-llm/blob/main/python/llm/dev/benchmark/all-in-one/README.md#config; and update the description here: https://github.com/intel-analytics/ipex-llm/blob/main/python/llm/dev/benchmark/all-in-one/README.md#optional-save-model-in-low-bit
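For reference, a hedged sketch of what the config.yaml entry might look like once the README is updated; the field names follow the existing all-in-one benchmark config format, and the values here are illustrative:

```yaml
# config.yaml (excerpt) -- illustrative values, not the PR's actual diff
repo_id:
  - 'meta-llama/Llama-2-7b-chat-hf'
low_bit: 'sym_int4'      # the low-bit format the model was saved in
cpu_embedding: False     # whether to run the embedding layer on CPU
test_api:
  # new API added by this PR: load a saved low-bit model and run it in fp16 on an Intel iGPU (Windows)
  - 'transformer_int4_fp16_loadlowbit_gpu_win'
```

As with the other loadlowbit APIs, this assumes the model has already been saved in low bit at `<model_path>-<low_bit>` beforehand, which is what the `model_path + '-' + low_bit` in the snippet below points at.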
Let's also add:

```python
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

# Load the saved low-bit model in fp16, then move it to the Intel GPU.
model = AutoModelForCausalLM.load_low_bit(model_path + '-' + low_bit, optimize_model=True, trust_remote_code=True,
                                          torch_dtype=torch.float16, use_cache=True, cpu_embedding=cpu_embedding).eval()
tokenizer = AutoTokenizer.from_pretrained(model_path + '-' + low_bit, trust_remote_code=True)
model = model.to('xpu')
```
Let's use `model = model.half().to('xpu')`, and remove `torch_dtype=torch.float16` for `run_transformer_int4_fp16_loadlowbit_gpu_win` for now.
Due to the bug here: https://github.com/analytics-zoo/nano/issues/1489
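Putting the two comments together, the loading path would look roughly like this; a sketch assuming the same variable names as the snippet above, not the PR's final diff:

```python
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

# No torch_dtype override at load time; the fp16 cast happens afterwards.
model = AutoModelForCausalLM.load_low_bit(model_path + '-' + low_bit, optimize_model=True,
                                          trust_remote_code=True, use_cache=True,
                                          cpu_embedding=cpu_embedding).eval()
tokenizer = AutoTokenizer.from_pretrained(model_path + '-' + low_bit, trust_remote_code=True)
# Cast to fp16 first, then move to the XPU, to work around the linked bug.
model = model.half().to('xpu')
```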
transformer_int4_fp16_loadlowbit_gpu_win api (…tics#11511)

* [ADD] add transformer_int4_fp16_loadlowbit_gpu_win api
* [UPDATE] add int4_fp16_lowbit config and description
* [FIX] fix run.py mistake
* [FIX] fix run.py mistake
* [FIX] fix indent; change dtype=float16 to model.half()
Description

Adds the `transformer_int4_fp16_loadlowbit_gpu_win` API for igpu-perf use.
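For context, the new test API slots into run.py's dispatch on `test_api`. As a rough illustration of the overall benchmark pattern (warm-up runs followed by timed `generate` calls), here is a minimal sketch; the function signature, arguments, and timing details are assumptions for illustration, not the PR's actual code:

```python
import time
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

def run_transformer_int4_fp16_loadlowbit_gpu_win(model_path, low_bit, cpu_embedding, prompt,
                                                 warm_up=1, num_trials=3, n_predict=32):
    # Illustrative sketch only: load the saved low-bit model, cast it to fp16
    # on the XPU, then time generation after a few warm-up iterations.
    # Assumes an XPU-enabled PyTorch build (e.g. via intel_extension_for_pytorch).
    model = AutoModelForCausalLM.load_low_bit(model_path + '-' + low_bit, optimize_model=True,
                                              trust_remote_code=True, use_cache=True,
                                              cpu_embedding=cpu_embedding).eval()
    tokenizer = AutoTokenizer.from_pretrained(model_path + '-' + low_bit, trust_remote_code=True)
    model = model.half().to('xpu')

    input_ids = tokenizer.encode(prompt, return_tensors='pt').to('xpu')
    with torch.inference_mode():
        for _ in range(warm_up):
            model.generate(input_ids, max_new_tokens=n_predict)
        times = []
        for _ in range(num_trials):
            torch.xpu.synchronize()
            start = time.perf_counter()
            model.generate(input_ids, max_new_tokens=n_predict)
            torch.xpu.synchronize()
            times.append(time.perf_counter() - start)
    return sum(times) / len(times)
```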